Adaptive Multivariate Global Testing

Giorgos Minas, John A.D. Aston, and Nigel Stallard 



We present a methodology for dealing with recent challenges in testing global hypotheses using multivariate observations. The proposed 
tests target situations, often arising in emerging applications of neuroimaging, where the sample size n is relatively small compared with the 
observations' dimension K. We employ adaptive designs allowing for sequential modifications of the test statistics adapting to accumulated 
data. The adaptations are optimal in the sense of maximizing the predictive power of the test at each interim analysis while still controlling 
the Type I error. Optimality is obtained by a general result applicable to typical adaptive design settings. Further, we prove that the potentially 
high-dimensional design space of the tests can be reduced to a low-dimensional projection space enabling us to perform simpler power 
analysis studies, including comparisons to alternative tests. We illustrate the substantial improvement in efficiency that the proposed tests can 
make over standard tests, especially in the case of n smaller or slightly larger than K. The methods are also studied empirically using both 
simulated data and data from an EEG study, where the use of prior knowledge substantially increases the power of the test. Supplementary 
materials for this article are available online. 
1. INTRODUCTION

In this work, we develop novel methodology for dealing with 
recent challenges in testing global hypotheses using multivariate
observations. The classical approach to this problem, Hotelling's
$T^2$-test (Hotelling 1931), can efficiently detect effects in every
direction of the multivariate space when the sample size n is
sufficiently large. However, in settings where n approaches or becomes
smaller than the observation dimension K, the $T^2$-test becomes
inefficient or inapplicable, respectively.
This cost in efficiency, paid due to the need to search in every 
direction of the alternative space, seems particularly wasteful 
(but avoidable), if prior knowledge about the direction of the 
effect is available. Motivated by the latter settings, often arising 
in the increasingly important field of neuroimaging, we develop 
tests which are powerful in studies with $n \gg K$, but can also be
efficient in situations where n is close to or smaller than K. 

The proposed tests employ adaptive designs allowing for se- 
quential modifications of the test statistic based on accumulated 
data. Such adaptive designs have straightforward but not exclusive
application in clinical trials. A large literature on the subject
(e.g., Bauer and Köhne 1994; Proschan and Hunsberger 1995; Lehmacher
and Wassmer 1999; Müller and Schäfer 2001; Brannath, Posch, and Bauer
2002; Liu, Proschan, and Pledger 2002; Brannath, Gutjahr, and Bauer
2012) deals with the derivation of flexible procedures that allow for
adaptations of the initial design without inflation of the Type I
error rate. Some sequential designs (e.g., Denne and Jennison 2000)
also permit design adaptations, but the latter need to be preplanned
and independent of the interim test statistics. Adaptive designs are
employed for many kinds of adaptations including sample size
recalculation (Lehmacher and Wassmer 1999; Mehta and Pocock 2011),
treatment or hypothesis selection (Kimani, Stallard, and Hutton 
2009), and sample allocation to treatments (Zhu and Hu 2010). 
Despite the fact that many authors have stressed the potential 
for test statistic adaptation (e.g., Bauer and Köhne 1994; Bretz
et al. 2009), there are only a few papers on the subject (Lang, 
Auterith, and Bauer 2000; Kieser, Schneider, and Friede 2002). 
Furthermore, various approaches for adaptive designs in multi- 
ple testing are available (see Bretz et al. 2009). These methods 
can efficiently detect few independently significant outcomes. 
However, it is well known that standard multiple testing meth- 
ods (e.g., Bonferroni and Simes tests) become conservative and 
inefficient in settings, such as the typical neuroimaging studies, 
where strong dependencies and a large number of outcomes are 
present (D'Agostino and Russell 2005). 

Similarly to the tests developed by O'Brien (1984), Läuter,
Glimm, and Kropf (1998), and Minas et al. (2012), the proposed 
tests are based on linear combinations of the observation vec- 
tors. The crucial element in this approach is the weighting vector 
reducing the observation vectors to the scalar linear combina- 
tions. This defines the direction in which we decide to search 
for effects, and it can substantially affect both the Type I and Type
II error rates of the tests. O'Brien proposed deriving the weighting
vectors under the assumption of uniform mean structure,
while Läuter et al. showed that if the weighting vector is derived
from the observation sums of products matrix, the Type I error 
is controlled and high power is attained under certain factorial 
structures. On the other hand, the tests in Minas et al. (2012) 
can attain high power levels independently of the mean and co- 
variance structure but a part of the sample is used in a separate 
pilot study to learn the weighting vector. 

In this work, linear combination test statistics, initially con- 
structed using weighting vectors derived from prior information, 
are sequentially updated based on observed data at subsequent 
interim analyses in an adaptive design. Early termination of the
study (due to early acceptance or rejection of the null hypothesis
at an interim analysis), which is often of interest, especially in
clinical trials, is also possible within our approach. Our methods
provide a formal framework for optimally using prior information in
constructing test statistics, as has been suggested, but not
implemented, in earlier papers (Pocock, Geller, and Tsiatis 1987;
Läuter, Glimm, and Kropf 1996; Tang, Gnecco, and Geller 1989a).

While our tests maintain the two prime targets of adaptive de- 
signs, namely flexibility and Type I error control (Brannath et al. 
2012), we also focus on attaining power optimality. Specifically, 
we employ the methods proposed by Spiegelhalter, Abrams, and 
Myles (2002) to derive optimal tests maximizing the predictive 
power of the test at each interim analysis. The methods of proof
can be useful in deriving optimal adaptive designs in more general
settings. As we illustrate in Section 3, the results of Theorem 3.1
could be used, for example, to derive optimal designs for regression
analysis.

The power performance of a multivariate test, lying in a pos- 
sibly high-dimensional design space, can be hard to illustrate 
and interpret. Therefore, power analysis of multivariate tests is 
typically restricted to a limited part of the design space. We 
tackle this problem by reexpressing the $O(K^2)$-dimensional design
space as a lower dimensional, easily interpretable space that
is still sufficient to determine power. The crucial step here is to 
identify a measure quantifying the angular distance between the 
selected weighting vector and the optimal weighting vector and 
proving its sufficiency in computing power. These results pro- 
vide wide understanding of the behavior of linear combination 
tests and allow us to extend earlier work on power analysis of 
single stage (Pocock, Geller, and Tsiatis 1987; Follmann 1996; 
Logan and Tamhane 2004) and sequential (Tang, Gnecco, and 
Geller 1989b; Tang, Geller, and Pocock 1993) linear combi- 
nation tests, beyond low-dimensional observations or specific 
mean and covariance structures. 

We perform extensive simulation studies to explore and com- 
pare the proposed and alternative single stage and sequential 
procedures throughout the design space. We show that linear 
combination tests outperform Hotelling's $T^2$-test when the latter
angular distance is below a certain value which, especially
for sample sizes close to K, can be rather high. We further show
that, in contrast to linear combination tests with fixed weighting
vectors, such as O'Brien's OLS test, the adaptive linear combination
tests can attain high power levels even in situations where the
weighting vector selected at the planning stage is orthogonal to the
true optimal (where, of course, a nonadaptive test would have zero
power asymptotically). The advantages of the proposed tests are also
illustrated through a real example taken from an EEG depression study
(Läuter, Glimm, and Kropf 1996).

This article is organized as follows. In Section 2, we for- 
mulate the class of linear combination tests while in Section 
3 we derive optimal, with respect to power, tests in this class. 
In Section 4, we present the results allowing us to characterize 
power based on low-dimensional summaries of the design pa- 
rameters. In Section 5, we discuss the main results of extensive 
simulation studies performed using the latter results to explore 
power and compare the proposed tests with alternative global 
tests under various conditions, while in Section 6 we apply our 
procedures to an EEG depression study. Section 7 includes a
short summary and discussion of the obtained results. Technical
lemmas and proofs are provided in Supplementary Material A, 
while further illustrations of the simulation studies are provided 
in Supplementary Material B. 

2. FORMULATION OF J-STAGE LINEAR
COMBINATION TESTS

In the following, we formulate J-stage linear combination z and
t-tests and define their error rate functions. We assume that the
K-dimensional observation vectors $Y_{ij} = (Y_{ij1}, \ldots, Y_{ijK})^T$
of subjects $i = 1, 2, \ldots, n_j$, participating in stage $j$,
$j = 1, 2, \ldots, J$, of the study, are independent and identically
distributed Gaussian random variables

$$Y_{ij} \sim N_K(\mu, \Sigma), \tag{2.1}$$

with mean $\mu = (\mu_1, \ldots, \mu_K)^T$ and positive definite
covariance matrix $\Sigma = (\sigma_{kk'})_{k,k'=1}^{K}$. In medical
applications, the mean vector is often interpreted as the treatment
effect. We wish to test the global null hypothesis of no treatment
effect $H_0: \mu = 0 = (0, 0, \ldots, 0)^T$ against the two-sided
alternative $H_1: \mu \neq 0$. Note that the methods which follow
equally apply to the two-sample test with common covariance matrix,
but we continue with the one-sample presentation to simplify notation.

The observation vectors $Y_{ij}$, $i = 1, 2, \ldots, n_j$, of the jth
stage are projected on the nonzero weighting vector
$w_j = (w_{j1}, w_{j2}, \ldots, w_{jK})^T$ and the projection magnitudes
form the linear combinations $L_{ij} = w_j^T Y_{ij}$,
$i = 1, 2, \ldots, n_j$, $j = 1, 2, \ldots, J$. The stagewise z and t
statistics for testing $H_0$ against $H_1$ using the random sample of
linear combinations $L_{ij}$, $i = 1, \ldots, n_j$, when $\Sigma$ is
either known or unknown, are respectively

$$Z_j = \frac{\bar{L}_j}{(\sigma_j^2/n_j)^{1/2}}, \qquad
  T_j = \frac{\bar{L}_j}{(s_j^2/n_j)^{1/2}}. \tag{2.2}$$

Here, $\sigma_j^2$ is the variance and $\bar{L}_j$, $s_j^2$ are the
sample mean and sample variance of the linear combination $L_j$,
respectively. Under assumption (2.1), the stagewise z and t statistics,
$Z_j$, $T_j$, $j = 1, 2, \ldots, J$, are respectively normally and
noncentrally t distributed, $Z_j \sim N(\theta_j, 1)$ and
$T_j \sim t_{\nu_j}(\theta_j)$, with location parameter

$$\theta_j = \frac{n_j^{1/2}\, w_j^T \mu}{(w_j^T \Sigma w_j)^{1/2}}, \tag{2.3}$$

and $\nu_j = n_j - 1$. Under $H_0$, the z and t statistics are standard
normal and Student's t random variables, that is, $Z_j \sim N(0, 1)$
and $T_j \sim t_{\nu_j}$. The two-sided stagewise p values of the z and
t-tests are, respectively, $p_{z_j} = 2\Phi(-|Z_j|)$ and
$p_{t_j} = 2\Psi_{\nu_j}(-|T_j|)$, where $\Phi(\cdot)$ and
$\Psi_{\nu_j}(\cdot)$ are the cumulative distribution functions of the
standard normal and Student's t-distribution with $\nu_j$ degrees of
freedom, respectively.
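
As a minimal sketch of the statistics in (2.2) and of the two-sided p value of the z statistic, the following Python fragment processes a single stage. The function names, the list-of-lists data layout, and the toy numbers are illustrative assumptions and not part of the original presentation. Only the standard library is used, so the p value of the t statistic (which needs the Student's t cdf) is not computed here.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def linear_combinations(Y, w):
    """Project each observation vector onto w: L_ij = w^T Y_ij."""
    return [sum(wk * yk for wk, yk in zip(w, y)) for y in Y]


def stagewise_z(Y, w, Sigma):
    """Z_j of (2.2) for known Sigma, together with its two-sided p value."""
    L = linear_combinations(Y, w)
    K = len(w)
    # sigma_j^2 = w^T Sigma w is the variance of one linear combination.
    var_L = sum(w[a] * Sigma[a][b] * w[b] for a in range(K) for b in range(K))
    z = fmean(L) / sqrt(var_L / len(L))
    return z, 2 * NormalDist().cdf(-abs(z))


def stagewise_t(Y, w):
    """T_j of (2.2): Sigma unknown, so the sample variance s_j^2 is used."""
    L = linear_combinations(Y, w)
    return fmean(L) / sqrt(variance(L) / len(L))


# Toy stage with n_j = 4 subjects and K = 2 outcomes (made-up numbers).
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
w = [1.0, 1.0]
Sigma = [[1.0, 0.0], [0.0, 1.0]]
z, p_z = stagewise_z(Y, w, Sigma)
t = stagewise_t(Y, w)
```

In this toy stage the linear combinations are 1, 1, 2, 0, so the z statistic uses the known variance 2 while the t statistic uses the sample variance 2/3.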

At the jth analysis, $j = 1, 2, \ldots, J$, performed after the jth
stage study, a combination function $C(\mathbf{p}_j)$ is used to combine
the stagewise p values, $\mathbf{p}_j = (p_1, \ldots, p_j)$, of stages 1
to j ($p_j$ either $p_{z_j}$ or $p_{t_j}$). Rejection and acceptance
critical values $\alpha_{1,j}$ and $\alpha_{0,j}$
($0 < \alpha_{1,j} < \alpha < \alpha_{0,j} < 1$, $j = 1, 2, \ldots, J$)
are used to decide whether to stop the study early and either reject or
accept $H_0$, respectively. Specifically, the J-stage sequential design
has the following form:

At interim analysis $j = 1, 2, \ldots, J - 1$,
    if $C(\mathbf{p}_j) < \alpha_{1,j}$,   stop study and reject $H_0$,
    if $C(\mathbf{p}_j) > \alpha_{0,j}$,   stop study and accept $H_0$,
    otherwise,                             continue to stage $j + 1$.
At the final analysis J,
    if $C(\mathbf{p}_J) < \alpha_{1,J}$,   stop study and reject $H_0$,
    otherwise,                             stop study and accept $H_0$.      (2.4)

Several combination functions are proposed in the literature.
Bauer and Köhne (1994) suggested the use of Fisher's product
combination function

$$C(\mathbf{p}_j) = \prod_{l=1}^{j} p_l, \tag{2.5}$$

while Lehmacher and Wassmer (1999) suggested the use of
the inverse normal combination function. These two combination
functions are the most commonly used in the literature
(Bretz et al. 2009). The formulation and results which follow
use Fisher's product function in (2.5), but our results equally
apply to other combination functions including the inverse
normal.
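
The decision rule (2.4) with Fisher's product function (2.5) can be sketched in a few lines of Python. The function name `combination_test`, the return convention, and the boundary values used in the example are illustrative assumptions; in practice the boundaries have to satisfy the Type I error equation of the design.

```python
from math import prod


def combination_test(p_values, alpha1, alpha0):
    """Apply rule (2.4) with Fisher's product C(p_j) = p_1 * ... * p_j.

    p_values : stagewise p values observed so far (stages 1..j)
    alpha1   : rejection boundaries alpha_{1,1..J}
    alpha0   : acceptance boundaries alpha_{0,1..J}
    Returns (stage, decision), decision in {"reject", "accept", "continue"}.
    """
    J = len(alpha1)
    for j in range(1, len(p_values) + 1):
        c = prod(p_values[:j])
        if c < alpha1[j - 1]:
            return j, "reject"
        # At the final analysis every non-rejection is an acceptance.
        if j == J or c > alpha0[j - 1]:
            return j, "accept"
    return len(p_values), "continue"


# Hypothetical two-stage boundaries, for illustration only.
a1 = [0.0233, 0.0087]
a0 = [0.5, 0.0087]
early_reject = combination_test([0.01], a1, a0)
early_accept = combination_test([0.6], a1, a0)
goes_on = combination_test([0.1], a1, a0)
final_reject = combination_test([0.1, 0.05], a1, a0)
final_accept = combination_test([0.1, 0.5], a1, a0)
```

The loop re-checks earlier analyses so that a study which should already have stopped is reported at the stage where it stopped.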

Herein, we will refer to the J-stage tests with linear combination
stagewise z and t-test statistics as the J-stage z and t-tests,
respectively. The power function, that is, the probability to reject
$H_0$, of the J-stage z or t-test is $\beta = \sum_{j=1}^{J} \beta_j$
where $\beta_1 = \Pr(p_1 < \alpha_{1,1})$ is the first stage and

$$\beta_j = \Pr\bigl(C(\mathbf{p}_l) \in (\alpha_{1,l}, \alpha_{0,l})\ \forall\, l < j;\ C(\mathbf{p}_j) < \alpha_{1,j}\bigr) \tag{2.6}$$

the jth stage power functions, $j = 2, 3, \ldots, J$ ($\beta$, $\beta_j$
are either $\beta_z$, $\beta_{z_j}$ or $\beta_t$, $\beta_{t_j}$,
respectively). The boundaries $\alpha_{1,j}$, $\alpha_{0,j}$ are
suitably chosen to satisfy the Type I error equation

$$\alpha = \alpha_{1,1} + \sum_{j=2}^{J} \int_{\alpha_{1,1}}^{\alpha_{0,1}} \int_{\alpha'_{1,2}}^{\alpha'_{0,2}} \cdots \int_{\alpha'_{1,j-1}}^{\alpha'_{0,j-1}} \alpha'_{1,j}\, dp_{j-1} \cdots dp_2\, dp_1, \tag{2.7}$$

where $\alpha'_{1,j} = \alpha_{1,j}/(p_1 p_2 \cdots p_{j-1})$ and
$\alpha'_{0,j} = \alpha_{0,j}/(p_1 p_2 \cdots p_{j-1})$ are the
conditional rejection and acceptance boundaries, respectively, of
stage j, $j = 2, 3, \ldots, J$.
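
For two stages the Type I error equation (2.7) can be evaluated in closed form, which may help when choosing boundaries. The worked equation below is a sketch for J = 2 with Fisher's product function, assuming uniform and independent stagewise p values under the null hypothesis and a final boundary no larger than the first-stage rejection boundary (so that the conditional boundary never exceeds 1). The numerical values are illustrative choices, not values taken from the article.

```latex
\begin{align*}
\alpha &= \alpha_{1,1} + \int_{\alpha_{1,1}}^{\alpha_{0,1}} \frac{\alpha_{1,2}}{p_1}\, dp_1
        = \alpha_{1,1} + \alpha_{1,2}\,\bigl(\ln \alpha_{0,1} - \ln \alpha_{1,1}\bigr), \\
\text{e.g.}\quad & 0.0233 + 0.0087\,\ln(0.5/0.0233)
        \approx 0.0233 + 0.0087 \times 3.066 \approx 0.050 .
\end{align*}
```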

3. OPTIMAL J-STAGE z AND t-TESTS

The crucial elements of these J-stage linear combination z and
t-tests are the stagewise weighting vectors $w_j$. In this section we
develop a methodology for optimally deriving these weighting
vectors. The next lemma is the first step for computing the
weighting vectors maximizing the power of the z and t-tests.

Lemma 3.1. Under (2.1), the power of the J-stage z and t-tests
in (2.4) with combination function as in (2.5) is nondecreasing
in the absolute value of $\theta_j$ in (2.3), $j = 1, 2, \ldots, J$.

Note that it can be straightforwardly shown that the above
result holds for both one-sided stagewise tests and for the inverse
normal combination function. The proof of the above lemma is
surprisingly complex because for some range of values of $\theta_j$
an increase in $|\theta_j|$ decreases the probability to continue to
the next stage and therefore the power of the subsequent stages,
$\beta_{(j+1)} = \sum_{l=j+1}^{J} \beta_l$, decreases. In Supplementary
Material A, we prove that even for this range of values of
$|\theta_j|$, the decrease (in absolute value) in $\beta_{(j+1)}$ is
bounded above by the increase in $\beta_j$.

The above result, apart from being crucial for deriving Theorem 3.1,
can also be useful for more general settings of adaptive designs.
For example, Lemma 3.1 proves that if investigators wish to apply an
adaptive z or t-test and are interested in maximizing the power of
these procedures, they only need to sequentially maximize the location
parameters of the stagewise test statistics separately. For instance,
suppose that one is willing to conduct an adaptive design study to
explore the relationship between an observation variable Y and a set
of covariates X described by $Y_j = X_j b_j + e_j$,
$e_j \sim N_n(0, \sigma^2 I_n)$, $j = 1, 2, \ldots, J$, independent.
Then, our results prove that to maximize the power of the J-stage test
with stagewise statistics the classical z and t statistics, with
respect to the experimental design, it is sufficient to maximize
$X_j^T X_j$, $j = 1, 2, \ldots, J$, which agrees with the standard
practice of deriving optimal designs.

Considering the J-stage linear combination z and t-tests,
Lemma 3.1 implies that to maximize the power of these tests
with respect to the weighting vectors $w_j$, it is sufficient to
maximize the value of $\theta_j$, $j = 1, 2, \ldots, J$. Using this
result, we next derive the power-optimal weighting vector.

Theorem 3.1. Under (2.1), the power of the J-stage z and
t-tests in (2.4) with combination function as in (2.5) is maximized
with respect to the weighting vectors $w_j$, $j = 1, 2, \ldots, J$,
if and only if the latter are proportional to

$$\omega^* = \Sigma^{-1} \mu. \tag{3.1}$$



The last result provides the optimal, in terms of power, weighting
vector $\omega^*$ for the J-stage linear combination tests. In Section
3.1, we show that $\omega^*$, which expresses the multivariate treatment
effect standardized with respect to the variance matrix $\Sigma$, is
central in characterizing the power of these tests. However, this
optimal vector $\omega^*$ depends on the unknown parameters $\mu$ and
$\Sigma$ and therefore is also unknown. In the next section, we develop
a methodology for selecting the weighting vectors $w_j$ in practice.
We propose using the information for $\mu$ and $\Sigma$, available at
each interim analysis, to optimally select $w_j$,
$j = 1, 2, \ldots, J$, where optimality is expressed here in terms of
predictive power. The source of this information is the data collected
from the stages completed before each interim analysis, but also prior
information extracted from previous studies and expert clinical
opinion. Predictive power allows the incorporation of this information
into our procedures in a natural and plausible way. Note that, as
we also explain in the next section, if Equation (2.7) is satisfied,
the Type I error of these tests is controlled.
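
The optimality of the weighting vector in Theorem 3.1 can be checked numerically. The sketch below, for K = 2 and with made-up values of the mean, covariance matrix, and stage sample size, compares the location parameter (2.3) at the weighting vector of (3.1) with its value at many random weighting vectors. The helper names `theta` and `solve2` are hypothetical, and the factor n^{1/2} follows from the definition of the z statistic in (2.2).

```python
import random
from math import sqrt


def theta(w, mu, Sigma, n):
    """Location parameter (2.3): n^{1/2} w^T mu / (w^T Sigma w)^{1/2}."""
    K = len(w)
    num = sum(wk * mk for wk, mk in zip(w, mu))
    quad = sum(w[a] * Sigma[a][b] * w[b] for a in range(K) for b in range(K))
    return sqrt(n) * num / sqrt(quad)


def solve2(S, b):
    """Solve the 2x2 linear system S x = b by Cramer's rule."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [(S[1][1] * b[0] - S[0][1] * b[1]) / det,
            (S[0][0] * b[1] - S[1][0] * b[0]) / det]


# Made-up parameter values, for illustration only.
mu = [0.5, 0.2]
Sigma = [[1.0, 0.6], [0.6, 2.0]]
n = 10

w_opt = solve2(Sigma, mu)                       # omega* = Sigma^{-1} mu
theta_opt = theta(w_opt, mu, Sigma, n)
# Mahalanobis distance (mu^T Sigma^{-1} mu)^{1/2}.
maha = sqrt(sum(m * w for m, w in zip(mu, w_opt)))

rng = random.Random(0)
best_random = max(
    abs(theta([rng.gauss(0, 1), rng.gauss(0, 1)], mu, Sigma, n))
    for _ in range(1000)
)
```

At the optimal weighting vector the location parameter equals n^{1/2} times the Mahalanobis distance, and no random direction exceeds it, in line with the Cauchy-Schwarz inequality.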

3.1 The Proposed z* and t* Tests 

Prior information, $\mathcal{I}_0$, is used to inform standard conjugate
multivariate priors for the observation mean and covariance
matrix. We use the Gaussian-inverse-Wishart prior

$$(\mu \mid \Sigma, \mathcal{I}_0) \sim N_K(m_0, \Sigma/n_0), \qquad
  (\Sigma \mid \mathcal{I}_0) \sim IW_{K \times K}(\nu_0, S_0^{-1}), \tag{3.2}$$

where $m_0$ represents a prior estimate of the value of $\mu$ and $n_0$
corresponds to the number of observations on which this prior
estimate is based, while $\nu_0$ and $S_0$ respectively represent the
degrees of freedom and the (positive definite) scale matrix of
the inverse-Wishart prior.

Under this standard Bayesian model (see Gelman et al. 2004),
the posterior distribution of $\mu$ and $\Sigma$ given the information
set $\mathcal{I}_j = \{\mathcal{I}_0, y_{(j)}\}$, consisting of the prior
information $\mathcal{I}_0$ and the data collected up to the jth interim
analysis $y_{(j)} = [y_1\ y_2 \cdots y_j]$, is
$(\mu \mid \Sigma, \mathcal{I}_j) \sim N_K(m_j, \Sigma/(n_0 + n_{(j)}))$
and $(\Sigma \mid \mathcal{I}_j) \sim IW_{K \times K}(\nu_{(j)}, S_j^{-1})$.
Here,

$$m_j = \frac{n_0 m_0 + n_{(j)} \bar{y}_{(j)}}{n_0 + n_{(j)}}, \qquad
  S_j = S_0 + \nu_{(j)} S_{y_{(j)}} + \frac{n_0 n_{(j)}}{n_0 + n_{(j)}}
        (\bar{y}_{(j)} - m_0)(\bar{y}_{(j)} - m_0)^T, \tag{3.3}$$

and $\nu_{(j)} = n_0 + n_{(j)} - 1$ with
$n_{(j)} = n_1 + n_2 + \cdots + n_j$ and
$\bar{y}_{(j)} = \sum_{l=1}^{j} \sum_{i=1}^{n_l} y_{il} / n_{(j)}$
respectively the sample size and sample mean of $y_{(j)}$. Note that,
due to the positive definiteness of the prior estimates $S_0$, the
posterior estimates $S_j$ are also positive definite. Positive
definiteness of $S_0$ is required for our procedures to be applicable.
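
A conjugate update of this kind can be sketched as follows. The function name `niw_update` and the toy data are illustrative assumptions. The scale matrix is updated with the centered sum-of-squares matrix of the data, which is the textbook form of the Gaussian-inverse-Wishart update; this is an assumption about how the sample covariance term enters, so the sketch should be read as the generic conjugate rule rather than as a transcription of (3.3).

```python
def niw_update(m0, n0, S0, Y):
    """Posterior mean estimate and scale matrix after observing the rows of Y.

    m0, n0 : prior mean estimate and prior "sample size"
    S0     : prior scale matrix (list of lists, positive definite)
    Y      : all observation vectors collected so far
    """
    n, K = len(Y), len(m0)
    ybar = [sum(y[k] for y in Y) / n for k in range(K)]
    # Weighted average of the prior estimate and the sample mean.
    m = [(n0 * m0[k] + n * ybar[k]) / (n0 + n) for k in range(K)]
    shrink = n0 * n / (n0 + n)
    S = [[S0[a][b]
          + sum((y[a] - ybar[a]) * (y[b] - ybar[b]) for y in Y)  # centered SS
          + shrink * (ybar[a] - m0[a]) * (ybar[b] - m0[b])
          for b in range(K)]
         for a in range(K)]
    return m, S


# Toy example: K = 2, prior worth n0 = 2 observations, two data points.
m1, S1 = niw_update([0.0, 0.0], 2, [[1.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [3.0, 2.0]])
```

Here the sample mean is (2, 1), so the updated mean estimate is halfway between the prior estimate and the sample mean, reflecting equal prior and data sample sizes.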

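We wish to use this information to select the weighting vectors
$w_j$ optimally. Optimality here is expressed in terms of predictive
power of the test. Predictive power (Spiegelhalter, Abrams, and
Myles 2002) in the present context is derived by averaging the
power of the J-stage z and t-tests over the distributions of the
model parameters for a given information set. The predictive
power for the first stage given the prior information set
$\mathcal{I}_0$ is $B_1 = \Pr(p_1 < \alpha_{1,1} \mid \mathcal{I}_0)$.
For the jth stage, $j = 2, 3, \ldots, J$, given the information set
$\mathcal{I}_{j-1}$, the predictive power $B_j$ is already determined by
the earlier stopping decision if $\mathcal{I}_{j-1}$ is such that
$C(\mathbf{p}_l) < \alpha_{1,l}$ for $l \in \{1, 2, \ldots, j - 1\}$
(early rejection) or $C(\mathbf{p}_l) > \alpha_{0,l}$ for
$l \in \{1, 2, \ldots, j - 1\}$ (early acceptance), and otherwise

$$B_j = \sum_{l=j}^{J} \Pr\bigl(C(\mathbf{p}_{l'}) \in (\alpha_{1,l'}, \alpha_{0,l'})\ \forall\, l' < l;\ C(\mathbf{p}_l) < \alpha_{1,l} \bigm| \mathcal{I}_{j-1}\bigr). \tag{3.4}$$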

The next result presents the weighting vectors that we suggest
to use for the stagewise linear combination z and t-tests.

Theorem 3.2. Under (2.1) and (3.2), the jth stage predictive
power, $B_{z_j}$, $j = 1, 2, \ldots, J$, of the J-stage z-test in (3.4)
is maximized with respect to the weighting vector $w_j$ if and only
if $w_j$ is proportional to

$$w_{z^*_j} = \Sigma^{-1} m_{j-1}. \tag{3.5}$$

Similarly, as we prove in Supplementary Material A,
for $\ldots \to \infty$, the jth stage predictive power, $B_{t_j}$,
$j = 1, 2, \ldots, J$, of the J-stage t-test in (3.4) is maximized with
respect to the weighting vector $w_j$ if and only if $w_j$ is
proportional to

$$w_{t^*_j} = S_{j-1}^{-1} m_{j-1}, \tag{3.6}$$

where $m_j$, $S_j$ are as in (3.3). The proposed J-stage tests,
henceforth called (adaptive) z* and t*-tests, proceed as follows: for
the jth analysis, $j = 1, 2, \ldots, J$, (i) obtain $w_{z^*_j}$ or
$w_{t^*_j}$ using (3.5) or (3.6), (ii) set $w_j$ equal to $w_{z^*_j}$ or
$w_{t^*_j}$ and compute the stage j statistic $Z_j$ or $T_j$ as in
(2.2), (iii) calculate the stage j p-value,
$p_{z_j} = 2\Phi(-|Z_j|)$ or $p_{t_j} = 2\Psi_{\nu_j}(-|T_j|)$, (iv) use
all the observed p-values to perform the combination test in (2.4).
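
Steps (i) to (iv) can be sketched for the z*-test as follows, for K = 2 and a known covariance matrix. The function names, the boundary values, and the toy data (identical observations within a stage, which is only possible to use here because the covariance matrix is treated as known) are illustrative assumptions; the prior mean estimate must be nonzero so that the first weighting vector is nonzero.

```python
from math import prod, sqrt
from statistics import NormalDist


def solve2(S, b):
    """Solve the 2x2 linear system S x = b by Cramer's rule."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [(S[1][1] * b[0] - S[0][1] * b[1]) / det,
            (S[0][0] * b[1] - S[1][0] * b[0]) / det]


def z_star_test(stages, m0, n0, Sigma, alpha1, alpha0):
    """Adaptive z*-test for K = 2; returns (stage, decision, p values)."""
    J = len(alpha1)
    m, n_seen, sums = list(m0), 0, [0.0, 0.0]
    p_values = []
    for j, Y in enumerate(stages, start=1):
        # (i) weighting vector proportional to Sigma^{-1} m_{j-1}, fixed
        #     before the stage j data are used.
        w = solve2(Sigma, m)
        # (ii) stage j z statistic of the linear combinations w^T Y_ij.
        n_j = len(Y)
        L_bar = sum(w[0] * y[0] + w[1] * y[1] for y in Y) / n_j
        var_L = sum(w[a] * Sigma[a][b] * w[b]
                    for a in range(2) for b in range(2))
        z = L_bar / sqrt(var_L / n_j)
        # (iii) two-sided stagewise p value.
        p_values.append(2 * NormalDist().cdf(-abs(z)))
        # (iv) combination test with Fisher's product.
        c = prod(p_values)
        if c < alpha1[j - 1]:
            return j, "reject", p_values
        if j == J or c > alpha0[j - 1]:
            return j, "accept", p_values
        # Update the mean estimate with all data seen so far.
        for y in Y:
            sums[0] += y[0]
            sums[1] += y[1]
        n_seen += n_j
        m = [(n0 * m0[k] + sums[k]) / (n0 + n_seen) for k in range(2)]
    return len(stages), "continue", p_values


# Toy data: the prior points along the first outcome, the data mostly
# along the second one.
stage = [[0.5, 1.2]] * 4
result = z_star_test([stage, stage], m0=[1.0, 0.0], n0=2,
                     Sigma=[[1.0, 0.0], [0.0, 1.0]],
                     alpha1=[0.0233, 0.0087], alpha0=[0.5, 0.0087])
```

In this toy run the first-stage statistic uses only the first outcome, the study continues, and the second-stage weighting vector shifts toward the second outcome before the final combination test.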

Importantly, the weighting vectors $w_{z^*_j}$ and $w_{t^*_j}$, given
the prior information and the observed (if any) data $y_{(j-1)}$, are
fixed before collecting $y_j$ and hence, under the standard conditions
described in the following theorem, the Type I error of the z* and
t*-tests is preserved.

Theorem 3.3. Under (2.1) and for $\alpha_{1,j}$, $\alpha_{0,j}$,
$j = 1, 2, \ldots, J$, satisfying Equation (2.7), the Type I error of
the z* and t*-tests is preserved at the nominal $\alpha$ level.

4. POWER CHARACTERIZATION (POC) 

To study the performance of a test, we primarily need to 
explore the relationship between its power function and the de- 
sign parameters. The latter might be, among others, the critical 
values, the sample size(s), and the model parameters. The crit- 
ical values and the sample size(s) are scalar and therefore it is 
straightforward to visualize power even across all their possi- 
ble values (e.g., using simulations). Their relation to power can 
then be easily described and understood. In univariate settings, 
this is also the case for the model parameters. However, in the 
multivariate setting, model parameters can be high-dimensional 
and therefore it is not practically feasible to visualize power 
over the whole design space. Power analysis is then typically 
restricted to a limited range of different structures of the model 
parameters. This might be sufficient for power analysis in spe- 
cific settings, but it has obvious limitations in considering the 
general behavior of a testing procedure. 

In the following, we encounter this problem in the context 
of linear combination tests and we provide a solution. We first 
consider the case of J-stage linear combination z and t-tests with
fixed weighting vectors which, apart from providing a method
for performing simple and efficient power analysis of tests such
as the OLS test in O'Brien (1984; see Logan and Tamhane 2004;
Pocock, Geller, and Tsiatis 1987; Tang, Geller, and Pocock
1993 for earlier work), also provides the intuition for the results
considering the z* and t* tests. Note that in Section 4, the critical
values and sample sizes (including the "prior" sample sizes) are
assumed to be fixed and described by the design vector

$$d = (\alpha_{0,1}, \alpha_{0,2}, \ldots, \alpha_{0,J}, \alpha_{1,1}, \alpha_{1,2}, \ldots, \alpha_{1,J}, \nu_0, n_0, n_1, \ldots, n_J).$$

To provide greater insight to the subsequent results, it is also
worth noting the joint distribution of the stagewise linear
combination z statistics, $Z_j$, $j = 1, 2, \ldots, J$, here for
$J = 2$,

$$\Pr(Z_1 < z_1, Z_2 < z_2) = \int \Pr(Z_1 < z_1, Z_2 < z_2 \mid y_1)\, dF(y_1)
  = \int_{\{y_1 :\, Z_1 < z_1\}} \Phi\bigl(z_2 - \theta_2(y_1)\bigr)\, dF(y_1),$$

where $F(y_1)$ is the cdf of the first stage data, $y_1$, and
$\theta_2(y_1)$ the location parameter as in (2.3). The latter
parameter is independent of $y_1$, that is $\theta_2(y_1) = \theta_2$,
for the linear combination tests with fixed weighting vector, while for
the adaptive z* and t* tests, $\theta_2(y_1)$ depends on $y_1$ through
the weighting vectors in (3.5) or (3.6), respectively. The next section
focuses on characterizing further the effect of the weighting vector,
through the parameters $\theta_j$, on the power function. Note that the
power function can be easily derived from the joint distribution of
the stagewise statistics by replacing $z_j$ with suitable rejection
or acceptance boundaries. In Supplementary Material A, we
show that the above expression can be easily generalized to any
$J > 1$ and that by replacing $\Phi(\cdot)$ with the cdf of the
Student's t-distribution $\Psi(\cdot)$, we can easily derive the joint
distribution of $T_j$, $j = 1, 2, \ldots, J$.
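
For fixed weighting vectors the stagewise z statistics are independent normal variables with unit variance, so the power of a two-stage test can be approximated by simulating them directly. The following Monte Carlo sketch does this for Fisher's product rule; the function name, the boundary values, the location parameters, the number of replications, and the seed are all illustrative assumptions.

```python
import random
from statistics import NormalDist


def two_stage_power(theta1, theta2, a11, a01, a12, n_sim=20000, seed=1):
    """Monte Carlo power of a two-stage z-test with fixed weighting vector.

    theta1, theta2 : stagewise location parameters
    a11, a01       : first-stage rejection and acceptance boundaries
    a12            : final rejection boundary for the product p_1 * p_2
    """
    rng = random.Random(seed)
    Phi = NormalDist().cdf
    rejections = 0
    for _ in range(n_sim):
        p1 = 2 * Phi(-abs(rng.gauss(theta1, 1.0)))
        if p1 < a11:
            rejections += 1            # early rejection at stage 1
        elif p1 <= a01:                # neither boundary crossed: continue
            p2 = 2 * Phi(-abs(rng.gauss(theta2, 1.0)))
            if p1 * p2 < a12:
                rejections += 1        # rejection at the final analysis
    return rejections / n_sim


# Hypothetical boundaries; theta = 0 gives the simulated Type I error.
size = two_stage_power(0.0, 0.0, 0.0233, 0.5, 0.0087)
power = two_stage_power(2.0, 2.0, 0.0233, 0.5, 0.0087)
```

With these boundaries the simulated rejection rate under the null hypothesis should be close to 0.05, and it rises as the location parameters move away from zero.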

4.1 PoC for the J-Stage z and t-Tests With Fixed
Weighting Vectors

To compute the power of the J-stage z and t-tests with fixed
weighting vectors $w_j = w$, it is sufficient to know the design
vector d, as well as the stagewise location parameters $\theta_j$ in
(2.3), which in this case vary across stages only through the sample
sizes $n_j$ contained in d. They can be reexpressed as

$$\theta_j = n_j^{1/2} \frac{w^T \mu}{(w^T \Sigma w)^{1/2}}
  = n_j^{1/2} \frac{\tilde{w}^T \tilde{\omega}^*}{\|\tilde{w}\|}
  = n_j^{1/2}\, \|\tilde{\omega}^*\| \cos(\mathrm{ang}(\tilde{w}, \tilde{\omega}^*)), \tag{4.1}$$

where $\mathrm{ang}(\tilde{w}, \tilde{\omega}^*)$ denotes the angle,
measured in radians at the origin, between the vectors $\tilde{w}$ and
$\tilde{\omega}^*$. Here, $\tilde{w} = \Sigma^{1/2} w$ and
$\tilde{\omega}^* = \Sigma^{1/2} \omega^* = \Sigma^{-1/2} \mu$ are the
standardized selected and optimal weighting vectors. In particular, the
latter expresses the standardized multivariate treatment effect,
generalizing the univariate ($K = 1$) standardized treatment effect
$\mu/\sigma$. Considering the weighting vector selection problem, the
first equation in (4.1) implies that a weighting vector that increases
the mean and/or decreases the variance of the linear combination gives
higher power. The ambiguity in the latter expression becomes clearer
by the standardization in the second equation which implies that
the weighting vector selection can be expressed as a process of
learning the standardized optimal weighting vector $\tilde{\omega}^*$.

The last equation in (4.1) establishes two scalar measures
which are sufficient to determine power. The first is the magnitude
of $\tilde{\omega}^*$,
$\|\tilde{\omega}^*\| = (\mu^T \Sigma^{-1} \mu)^{1/2} = D_{\mu,\Sigma}$,
which is the Mahalanobis distance between the distributions of the
observation $F_Y$ under the null and the alternative hypotheses. The
Mahalanobis distance is a generalization of the univariate
signal-to-noise ratio and can be interpreted as a measure of deviation
from the null hypothesis. In medical settings, it is a well-known
global measure of the strength of the treatment effect. The second,
$\cos(\mathrm{ang}(\tilde{w}, \tilde{\omega}^*))$, is a measure of
angular distance between the selected and the optimal weighting vector.
It is a measure, in other words, of the distance of our weighting
vector selection to the optimal choice. Under this representation, it
becomes clear that, for fixed weighting vectors, the location parameter
$\theta_j$ is equal to $n_j^{1/2}$ times a measure ($D_{\mu,\Sigma}$)
of the strength of the treatment effect scaled down by a measure
($\cos(\mathrm{ang}(\tilde{w}, \tilde{\omega}^*))$) of the distance
between the parameters and their prior estimates. The last results
are formally stated in the next theorem.

Theorem 4.1. The design vector d, the Mahalanobis distance
$D_{\mu,\Sigma} = (\mu^T \Sigma^{-1} \mu)^{1/2}$ and the angle
$\mathrm{ang}(\tilde{\omega}^*, \tilde{w})$ between the vectors
$\tilde{\omega}^* = \Sigma^{-1/2} \mu$ and $\tilde{w} = \Sigma^{1/2} w$
are sufficient to determine the power function $\beta$ of the J-stage
linear combination z and t-tests with fixed weighting vectors
$w_j = w$.
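
The decomposition of the location parameter into a Mahalanobis distance and an angle can be verified numerically. The sketch below assumes a diagonal covariance matrix, so that its square root is taken elementwise, and uses made-up values for the mean, the variances, the weighting vector, and the stage sample size; the factor n_j^{1/2} is the one implied by the definition of the z statistic in (2.2).

```python
from math import acos, cos, sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return sqrt(dot(a, a))


# Made-up values; Sigma is assumed diagonal with these variances.
mu = [0.6, 0.3, 0.0]
sigma2 = [1.0, 4.0, 0.25]
w = [1.0, 1.0, 1.0]
n_j = 16

# Direct evaluation: n_j^{1/2} w^T mu / (w^T Sigma w)^{1/2}.
theta_direct = (sqrt(n_j) * dot(w, mu)
                / sqrt(sum(wk * wk * s for wk, s in zip(w, sigma2))))

# Standardized vectors: Sigma^{1/2} w and Sigma^{-1/2} mu.
w_tilde = [wk * sqrt(s) for wk, s in zip(w, sigma2)]
omega_tilde = [mk / sqrt(s) for mk, s in zip(mu, sigma2)]

D = norm(omega_tilde)                                   # Mahalanobis distance
angle = acos(dot(w_tilde, omega_tilde) / (norm(w_tilde) * D))
theta_geometric = sqrt(n_j) * D * cos(angle)
```

Both routes give the same location parameter, which illustrates why the distance and the angle, together with the design vector, are enough to compute power for a fixed weighting vector.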



4.2 PoC for the z*-Test

The sequential adaptation of the weighting vector increases
the complexity within the relation between the power function
and the design parameters. However, following similar methodology
as above, analogous results can be derived. For this we
use two steps, the first of which involves standardizing the
procedure, similarly to (4.1), and the second establishing a rotation
invariance property of the power function. The next lemma is a
direct consequence of the standardization step summarizing $\mu$,
$\Sigma$, and $m_0$ to the vectors $\tilde{\omega}^*$ and
$\tilde{w}_{z^*_1}$.

Lemma 4.1. The design vector d, the standardized optimal
weighting vector $\tilde{\omega}^* = \Sigma^{-1/2} \mu$ and the
standardized first-stage weighting vector $\tilde{w}_{z^*_1}$ in (3.5)
are sufficient to determine the power function $\beta_{z^*}$.

In the above result, we make use of the fact that the location
parameter, $\theta_{z^*_j}$, of the z*-test can be written as

$$\theta_{z^*_j} = n_j^{1/2} \frac{\tilde{w}_{z^*_j}^T \tilde{\omega}^*}{\|\tilde{w}_{z^*_j}\|}, \qquad
  \tilde{w}_{z^*_j} = \frac{n_0 \tilde{w}_{z^*_1} + n_{(j-1)} \tilde{y}_{(j-1)}}{n_0 + n_{(j-1)}}, \qquad
  \tilde{y}_{(j)} = \Sigma^{-1/2} \bar{Y}_{(j)} \sim N_K(\tilde{\omega}^*, I/n_{(j)}), \tag{4.2}$$

which implies that the adaptive selection of the weighting vectors
can be reexpressed as a procedure of adaptive estimation of
the vector $\tilde{\omega}^*$. Under this standardization, we can
proceed to the rotation-invariance step which results in the next
lemma.

Lemma 4.2. The power, $\beta_{z^*}$, of the z*-test is invariant to
rotations of the weighting vector $\tilde{w}_{z^*_1}$ around the
optimal weighting vector $\tilde{\omega}^*$.

The idea behind Lemma 4.2 is that if $\tilde{w}_{z^*_1}$ is rotated
around $\tilde{\omega}^*$, that is, $\tilde{w}_{z^*_1}$ is replaced by
$\tilde{w}'_{z^*_1} = R \tilde{w}_{z^*_1}$, where R is a rotation
matrix with rotation axis $\tilde{\omega}^*$, the rejection region of
the test is changed. However, the new rejection region is simply a
rotation of the initial rejection region. That is, for each point, say
$\tilde{y}_{(j)}$, in the initial rejection region, we can find a
unique point, say $\tilde{y}'_{(j)}$, in the rotated rejection region
such that $\tilde{y}'_{(j)} = R \tilde{y}_{(j)}$. Because the
symmetrical Gaussian distribution of the observations
$\tilde{y}_{(j)} \sim N_K(\tilde{\omega}^*, I/n_{(j)})$ remains
unchanged under the rotation, the likelihood of the rejection region,
that is, the power of the z*-test, remains the same. The next theorem
is a direct consequence of Lemmas 4.1 and 4.2.

Theorem 4.2. The design vector d, the Mahalanobis distance
$D_{\mu,\Sigma}$ and the angle
$\mathrm{ang}(\tilde{\omega}^*, \tilde{w}_{z^*_1})$ between the vectors
$\tilde{\omega}^*$ and $\tilde{w}_{z^*_1}$ are sufficient to determine
the power function $\beta_{z^*}$.

The above theorem states that the dependence of the power 
function on the model parameters and their prior estimates is 
described by simply a scalar measure of the strength of the treat- 
ment effect and a scalar measure of distance between the param- 
eters and their prior estimates. It provides a sufficient description 
of power which is based on easily interpretable summaries and 
is considerably lower dimensional (importantly not depending 
on K, see Table 1). This allows us to perform power analysis of 
the adaptive J-stage z*-test in a simple way, potentially covering
the whole design space. 
4.3 PoC for the t*-Test

The need to estimate the unknown $\Sigma$ increases substantially
the dimension and the complexity of the design space. The sequential
estimation of $\Sigma$, in addition to $\mu$, to obtain the weighting
vectors $w_{t^*_j}$, implies that the power analysis needs to account
for both estimation procedures. For this, we write the standardized
weighting vector of $w_{t^*_j}$, $j = 1, 2, \ldots, J$, in (3.6) as

$$\tilde{w}_{t^*_j} = D_j^{-1} \tilde{w}_{z^*_j}, \tag{4.3}$$

where $D_j = \Sigma^{-1/2} S_{j-1} \Sigma^{-1/2}$ and
$\tilde{w}_{z^*_j}$ is the jth standardized weighting vector of the
z*-test in (4.2). Here the $\Sigma$-deviation matrix $D_j$ is a measure
of deviation of the estimate $S_{j-1}$ in (3.3) from the parameter
$\Sigma$. The weighting vector $\tilde{w}_{t^*_j}$ is then written as a
product of the inverse of the matrix $D_j$, which accounts for the
estimation of $\Sigma$, and the vector $\tilde{w}_{z^*_j}$, which
accounts for the estimation of $\mu$, the latter taking $\Sigma$ as
known. We next follow the same steps as in Section 4.2 for deriving the
PoC of the t*-test. The standardization step results in the next lemma
summarizing $\mu$ and $\Sigma$ and their prior estimates $m_0$ and
$S_0$ to the vectors $\tilde{\omega}^*$, $\tilde{w}_{z^*_1}$ and the
matrix $D_1$, which have a clear interpretation.

Lemma 4.3. The design vector d, the matrix $D_1$ in (4.3) and
the vectors $\tilde{\omega}^*$ and $\tilde{w}_{z^*_1}$ are sufficient
to determine the power function $\beta_{t^*}$.

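Here, we use that the location parameter $\theta_{t^*_j}$ and the
$\Sigma$-deviation matrix $D_j$ can be written as

$$\theta_{t^*_j} = n_j^{1/2} \frac{\tilde{w}_{z^*_j}^T D_j^{-1} \tilde{\omega}^*}{\|D_j^{-1} \tilde{w}_{z^*_j}\|}, \qquad
  D_j = D_1 + \nu_{(j-1)} S_{\tilde{y}_{(j-1)}} + \frac{n_0 n_{(j-1)}}{n_0 + n_{(j-1)}}
        (\tilde{y}_{(j-1)} - \tilde{w}_{z^*_1})(\tilde{y}_{(j-1)} - \tilde{w}_{z^*_1})^T, \tag{4.4}$$

and that $\tilde{w}_{z^*_j}$ can be written as the weighted average in
(4.2). Here, $S_{\tilde{y}_{(j)}} = \Sigma^{-1/2} S_{y_{(j)}} \Sigma^{-1/2}$
is the covariance matrix of the sample $\tilde{y}_{il}$,
$i = 1, 2, \ldots, n_l$, $l = 1, 2, \ldots, j$, where, importantly,
$\tilde{y}_{il} = \Sigma^{-1/2} y_{il} \sim N_K(\tilde{\omega}^*, I)$.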

In a similar fashion to the previous section, we next establish
the invariance of the power function under certain rotations
of the prior estimates. For this, we define $V = [v_1\ v_2\ \ldots\ v_K]$
to be the matrix with columns the orthonormal eigenvectors
of $D_1$ and $\Lambda_1 = \mathrm{diag}(\lambda_1)$ the diagonal matrix with
diagonal $\lambda_1 = (\lambda_{11}, \ldots, \lambda_{1K})^T$ the vector of the corresponding
eigenvalues ($\lambda_{11} \geq \lambda_{12} \geq \cdots \geq \lambda_{1K} > 0$). We can then write
$D_1 = V \Lambda_1 V^T$, $w_{z^*} = V c_{z^*}$, and $\omega^* = V c^*$, where

$$ c_{z^*,k} = \cos(\mathrm{ang}(v_k, w_{z^*})), \quad c^*_k = \cos(\mathrm{ang}(v_k, \omega^*)), \quad k = 1, 2, \ldots, K. \qquad (4.5) $$

The rotation invariance property of the $t^*$-test is described in the
next lemma.

Lemma 4.4. The power function $\beta_{t^*}$ is invariant to simultaneous
rotations of the vector $w_{z^*}$ and the eigenvectors of the
matrix $D_1$ around the optimal weighting vector $\omega^*$.



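Table 1. Model and prior parameters of the $z^*$ and $t^*$-tests,
respectively, and their dimension

| Parameters ($z^*$-test) | Dimension | Parameters ($t^*$-test) | Dimension |
|---|---|---|---|
| $\mu, \Sigma, m_0$ | $(K^2 + 5K)/2$ | $\mu, \Sigma, m_0, S_0$ | $K^2 + 3K$ |
| $\omega^*, w_{z^*}$ | $2K$ | $\omega^*, w_{z^*}, D_1$ | $(K^2 + 5K)/2$ |
| $D_{\mu,\Sigma}, \mathrm{ang}(\omega^*, w_{z^*})$ | $2$ | $c^*, c_{z^*}, \lambda_1$ | $3K$ |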


The proof of Lemma 4.4 is similar to the proof of Lemma 4.2,
albeit rather more complex. The next theorem is a direct
consequence of Lemmas 4.3 and 4.4.

Theorem 4.3. The design vector $d$, the vector of eigenvalues
$\lambda_1$ of the matrix $D_1$ in (4.3), and the vectors $c_{z^*}$ and $c^*$ in (4.5)
are sufficient to determine the power function $\beta_{t^*}$.

As we can see in Table 1, the last result reduces the dimension
of the design space of the $t^*$-test substantially, allowing us to
explore power across the design space. While the design space,
due to the covariance matrix estimation, still depends on $K$, it is
reduced from order $K^2$ to order $K$.
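
To make the size of this reduction concrete, the dimension formulas of Table 1 can be evaluated for a given $K$. The short sketch below only restates those formulas; the function name and dictionary keys are illustrative.

```python
def design_space_dimensions(K):
    """Evaluate the dimension formulas listed in Table 1 for dimension K."""
    return {
        "z*": {
            "parameters": (K * K + 5 * K) // 2,    # mu, Sigma, m_0
            "standardized": 2 * K,                 # omega*, w_z*
            "reduced": 2,                          # Mahalanobis distance, angle
        },
        "t*": {
            "parameters": K * K + 3 * K,           # mu, Sigma, m_0, S_0
            "standardized": (K * K + 5 * K) // 2,  # omega*, w_z*, D_1
            "reduced": 3 * K,                      # c*, c_z*, lambda_1
        },
    }


if __name__ == "__main__":
    for K in (10, 15):
        print(K, design_space_dimensions(K))
```

For $K = 10$, for instance, the $t^*$-test design space shrinks from 130 parameters to 30, and that of the $z^*$-test from 75 to 2.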

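Furthermore, this reduction provides an understanding of how
the selection of the weighting vector affects power. This becomes
clearer if we consider that $\theta_{t^*,j}$ in (4.4) can be written
as

$$ \theta_{t^*,j} = \frac{n_j^{1/2}\, c_{z^*,j}^T \Lambda_j^{-1} c^*}{(c_{z^*,j}^T \Lambda_j^{-2} c_{z^*,j})^{1/2}}, \quad j = 1, 2, \ldots, J, $$

where

$$ c_{z^*,j} = \frac{n_0 c_{z^*,1} + n_{(j-1)} \overline{cy}_{(j-1)}}{n_0 + n_{(j-1)}}, $$

$$ \Lambda_j = \Lambda_1 + \nu_{(j-1)} S_{cy(j-1)} + \frac{n_0 n_{(j-1)}}{n_0 + n_{(j-1)}} (\overline{cy}_{(j-1)} - c_{z^*,1})(\overline{cy}_{(j-1)} - c_{z^*,1})^T. $$

Here, $\overline{cy}_{(j)}$ and $S_{cy(j)}$ are the sample mean and sample
covariance matrix of the transformed observation vectors
$cy_{(j)} = [cy_1\ cy_2\ \cdots\ cy_j]$, with $cy_l$, $l = 1, 2, \ldots, j$, the matrix with
columns $cy_{il} = V^T wy_{il} \sim N_K(c^*, I)$, $i = 1, 2, \ldots, n_l$. The last
expressions show that the distance of the prior estimates $m_0$, $S_0$
to the model parameters $\mu$, $\Sigma$ can be expressed by the distances
of the vectors $c_{z^*}$ and $\lambda_1^{-1} = (1/\lambda_{11}, \ldots, 1/\lambda_{1K})^T$ to $c^*$, the
latter directly reflected in power through $\theta_{t^*}$ (see the next section
for more information).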

In the special case of the first stage $\Sigma$-deviation matrix being
proportional to the identity matrix, that is, $D_1 \propto I$
($\lambda_{11} = \lambda_{12} = \cdots = \lambda_{1K}$), as the next result shows, the design
space can be reduced further.

Theorem 4.4. For $D_1 = c^{-1} I$, the design vector $d$, the
constant $c$, the Mahalanobis distance $D_{\mu,\Sigma}$, and the angle
$\mathrm{ang}(w_{z^*}, \omega^*)$ are sufficient to determine the power function $\beta_{t^*}$.

The last theorem proves that, for $D_1 \propto I$, we can use the
fact that the prior $\Sigma$-deviation matrix $D_1$ does not change the
directions of the $w_{z^*}$'s, to show that the relation of $\beta_{t^*}$ to the model
parameters and their prior estimates can be described simply by
the scalars $D_{\mu,\Sigma}$ and $\mathrm{ang}(w_{z^*}, \omega^*)$. In the next section, we use
this result and the results of Theorems 4.2 and 4.3 to perform
power analysis studies.

Figure 1. Power (left panel) and RSSR (right panel) versus sample allocation ratio. We plot the sequential $\chi^2$-test (magenta, dotted with ◇) and the
$z^*$ (green line), sequential $z$ (cyan), and $z^+$ (orange) tests with first stage/fixed/first step weighting vector at 0° (×), 30° (○), 60° (□),
and 90° (▽) angle to the optimal. The remaining design parameters are $J = 2$, $K = 10$, $\alpha = 0.05$, $\alpha_{1,1} = 0.01$, $\alpha_{0,1} = 1$, $n_T = 60$, $n_0 = 0.5 n_1$,
$D_{\mu,\Sigma} = 0.65$.

5. EMPIRICAL STUDIES 

To explore properties of the adaptive $z^*$ and $t^*$-tests as well as
alternative global tests and to perform comparisons, we present
empirical studies making use of the results in Theorems 4.2,
4.3, and 4.4.

In addition to the $z^*$ and $t^*$-tests, we consider linear combination
$z$ and $t$-tests with fixed weighting vectors, a class that includes
the OLS $z$ and $t$-tests in O'Brien (1984). We also consider
the likelihood-ratio $\chi^2$ and Hotelling's $T^2$-test with statistics
$X^2 = n \bar{y}^T \Sigma^{-1} \bar{y}$ and $T^2 = n(n - K) \bar{y}^T S_y^{-1} \bar{y} / (K(n - 1))$ that
follow the noncentral $\chi^2$ and $F$ distribution with $K$ and $(K, n - K)$
degrees of freedom, respectively, and noncentrality parameter
$D^2_{\mu,\Sigma}$. We consider both single-stage and sequential $J$-stage
designs for all these tests. Finally, the two-step, single-stage linear
combination $z^+$ and $t^+$ tests proposed in Minas et al. (2012) are
also considered. Note that the latter tests can be derived as special
cases of the $z^*$ and $t^*$-tests for $J = 2$, $(\alpha_{1,1}, \alpha_{0,1}) = (0, 1)$
and $C(p_1, p_2) = p_2$.

A range of experiments is performed under different values
of the design parameters. The power function of $J$-stage ($J > 1$)
tests is not analytically tractable and therefore power is
approximated by the rate of rejections in a large number of simulated
replications, here $R = 10{,}000$, of a single experiment. Furthermore,
to study the reduction in sample size due to early stopping
of the study, we also empirically compute the rate of sample size
reduction (RSSR),

$$ \mathrm{RSSR} = 100 \times \left( \frac{n_T - E(N)}{n_T} \right) \%, $$

where $n_T = n_1 + n_2 + \cdots + n_J$ is the total sample size, $N$ the
sample size used for a single replication of the study, and $E(N)$
its expected value. Note that single-stage tests have RSSR = 0,
in contrast to sequential tests that allow for early stopping and
thus have nonzero RSSR.
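
The two empirical summaries just described can be sketched in a few lines. The example below is a simplified stand-in and not the simulation code of the study: power is estimated as a rejection rate for a one-sided test whose statistic is assumed to be $N(\theta, 1)$, and RSSR is computed from a list of per-replication sample sizes, with the total sample size as denominator. The function names, the seed, and the numerical inputs are hypothetical.

```python
import random
from statistics import NormalDist, mean


def estimate_power(theta, alpha=0.05, replications=10_000, seed=1):
    """Approximate power by the rejection rate over simulated replications.

    Assumption: the test statistic is N(theta, 1) and the test rejects for
    values above the upper alpha quantile of the standard normal.
    """
    rng = random.Random(seed)
    critical_value = NormalDist().inv_cdf(1.0 - alpha)
    rejections = sum(
        rng.gauss(theta, 1.0) > critical_value for _ in range(replications)
    )
    return rejections / replications


def rssr(total_sample_size, used_sample_sizes):
    """Rate of sample size reduction (in percent).

    used_sample_sizes holds the sample size N actually used in each
    replication; their mean serves as the empirical estimate of E(N).
    """
    expected_n = mean(used_sample_sizes)
    return 100.0 * (total_sample_size - expected_n) / total_sample_size


if __name__ == "__main__":
    print(estimate_power(0.0))   # close to alpha under the null
    print(estimate_power(3.0))
    print(rssr(20, [10, 20]))    # half of the replications stop after 10
```

A single-stage design always uses the full sample, so every entry of `used_sample_sizes` equals the total and the function returns zero.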

5.1 Simulation Data Examples 

We next summarize the main results of a comprehensive study
of the power behavior of the above tests in relation to the design
parameters (more illustrations are included in Supplementary
Material B). First, larger values of $D_{\mu,\Sigma}$ and/or $n_1$ result in
higher power values for all tests considered, except the $z$ and
$t$-tests with fixed weighting vectors $w$ orthogonal to $\mu$, for which
$\beta = \alpha$. Considering the prior sample size, the results indicate
that for $n_0 \in (0.5 n_1, 0.75 n_1)$ the prior estimates become
influential, but they do not dominate the accumulated data when
selecting the weighting vector, while larger values of $n_0$ force
the $z^*$ and $t^*$-tests to behave more like the $z$ and $t$-tests
with fixed weighting vector. Furthermore, simulation examples
confirm that larger values of the acceptance critical values $\alpha_{0,j}$
increase the power of multistage tests, especially for larger
potential power gain in subsequent stages, at the expense of less
chance of early acceptance. Simulation examples also confirm
that larger power is gained if larger rejection critical values $\alpha_{1,j}$
are allocated to stages with larger potential power gain, while
the value of RSSR increases for larger $\alpha_{1,j}$ in early stages.

We also consider power behavior related to allocation of sample
size to stages (Figure 1). For the sequential $z$ and $\chi^2$-test, the
results show that higher power is achieved if sample allocation
is analogous to $\alpha$-rate allocation. The $z^*$ and $t^*$-tests generally
attain higher efficiency for close to balanced allocations. For
$w_{z^*}$ close to (far from) the optimal $\omega^*$, slightly higher power is
attained for assigning more sample to early (late) stages. Small
to moderate allocation ratios $r$ are more appropriate for the $z^+$
test since no $\alpha$ rate is spent in the first stage. Further, as in the
$\chi^2$-test, the $z^*$-test achieves higher RSSR for $r = 0.5$.

Before we proceed to comparisons, it is worth considering
the impact of $\Sigma$ being unknown and thus estimated on
the performance of the $t^*$-test. First, in the case of $D_1 \propto I$
($\lambda_1 \propto \mathbf{1} = (1, 1, \ldots, 1)^T$), which as we show in Theorem 4.4 is
a somewhat easier case to consider, the $\Sigma$ estimation variability
is substantially reduced and thus we generally expect $w_{t^*}$ to be
closer to $w_{z^*}$. On the other hand, if $D_1 \not\propto I$ ($\lambda_1 \not\propto \mathbf{1}$), the
direction of $\lambda_1$ is more influential on $w_{t^*}$, with the consequence being
double-edged (see Figure 2). That is, compared to the situation
of $\lambda_1 \propto \mathbf{1}$, the distance of the $w_{t^*}$'s to the optimal can be larger (left
panel) but also smaller (right panel) depending on how close
the direction of $\lambda_1^{-1} = (1/\lambda_{11}, \ldots, 1/\lambda_{1K})^T$ is to the optimal
direction $c^*$.

Finally, it is useful to note that throughout our simulations of the
$t^*$-test, $\cos(\mathrm{ang}(c^*, \Lambda_1^{-1} c_{z^*,1}))$ is shown to be a robust summary,
albeit not sufficient (see Supplementary Material B, Figure 7,
Section 2.1), of the distance between the model parameters
and their prior estimates. For this reason, but also to reduce
complexity, in the comparisons to follow, we focus on the case of
$\lambda_1 \propto \mathbf{1}$ (particularly, as we explain later on, in cases resembling
the right panel of Figure 2), for various values of the summary
$\cos(\mathrm{ang}(c^*, \Lambda_1^{-1} c_{z^*,1}))$.

Figure 2. Power of the $t^*$-test versus Mahalanobis distance for various $c^*$, $c_{z^*,1}$, $\lambda_1$. In the left panel, the vectors $c^* = c_{z^*,1} \propto \mathbf{1}$, while in the
right panel $c^* = e_1 = (1, 0, \ldots, 0)^T$ and $c_{z^*,1} \propto \mathbf{1}$ which, for $\lambda_1 = \mathbf{1}$ (green line with ×), give $\varphi = \mathrm{ang}(c^*, \Lambda_1^{-1} c_{z^*,1}) = \mathrm{ang}(c^*, \lambda_1^{-1}) = 0°$ and $72°$,
respectively. In both panels, $\lambda_1 \not\propto \mathbf{1}$ are also chosen to give $\varphi = 25°$ (dark green line with ○), $45°$ (dark green line with |) and $65°$ (dark green
line with ◇). The remaining design parameters are $J = 2$, $K = 10$, $\alpha = 0.05$, $\alpha_{1,1} = 0.01$, $\alpha_{0,1} = 1$, $n_T = 20$, $r = 0.5$, $n_0 = 0.75 n_1$, $\nu_0 = n_0 - 1$.

In terms of comparisons, first note that, for fixed design
parameters, single-stage tests attain higher power levels than
multi-stage tests, albeit at the expense of not allowing for
early stopping and thus not allowing for sample size reduction
(RSSR = 0). Furthermore, it is worth emphasizing that,
for fixed design parameters, the linear combination
test with weighting vector (either fixed or initial) set equal to the
optimal weighting vector $\omega^*$ attains the maximum power and
provides an upper bound to all the other presented procedures,
including Hotelling's $T^2$-test, as proved in Minas et al. (2012)
(Corollary 1). Compared to the $z$-tests with fixed weighting
vectors $w$, as we can see in Figure 3, the adaptive $z^*$-test loses some power
for $w$ ($= w_{z^*}$) close to optimal but gains substantial amounts of
power for $w$ far from optimal, importantly avoiding the problem
of $z$-tests having no power beyond the level $\alpha$ for $w$ orthogonal to optimal. This
result emphasizes that, even though the power of the proposed
tests remains sensitive to the prior information used to select
the weighting vector, they are less sensitive to the initial
selection of the weighting vector than the $z$ and $t$-tests, where the
weighting vector is fixed. The adaptive $z^*$-test also has substantially
higher power than $z^+$ for small angles to the optimal and
slightly lower power for large angles. Finally, the power of the
single-stage and sequential $\chi^2$-tests is approximately equal to
the power of the $z^*$-test for $w_{z^*}$ having respectively 60° and 45°
angle with $\omega^*$. Note that, as the results in Figure 3 confirm, all
the considered tests control the Type I error at the nominal level
$\alpha = 0.05$.
Figure 3. Power and RSSR versus Mahalanobis distance. We plot the $z^*$-test (green line) with the tests $z^+$ (orange) (top left), sequential
$z$ (cyan) and $\chi^2$ (magenta, dotted with ◇) (top right), single stage $z$ (blue) and $\chi^2$ (red, dotted with ◇) (bottom left) and sequential $\chi^2$ (bottom right). The linear
combination $z^*/z/z^+$ tests are performed with first stage/fixed/first step weighting vectors having 0° (×), 30° (○), 60° (□), and 90° (▽) angle to the
optimal. The remaining design parameters are $J = 2$, $K = 10$, $\alpha = 0.05$, $\alpha_{1,1} = 0.01$, $\alpha_{0,1} = 1$, $n_T = 30$, $r = 0.5$, $n_0 = 0.75 n_1$, $\nu_0 = n_0 - 1$.

Figure 4. Power and RSSR versus the total sample size $n_T$. We plot the $t^*$-test (green line) with the tests $t^+$ (orange) (top left), sequential
$t$ (cyan) and $T^2$ (magenta, dotted with ◇) (top right), single stage $t$ (blue) and $T^2$ (red, dotted with ◇) (bottom left) and sequential $T^2$ (bottom right). The linear
combination $t^*/t/t^+$ tests are performed with first stage/fixed/first step weighting vectors having 0° (×), 30° (○), 60° (□), and 90° (▽) angle to
the optimal. The remaining design parameters are $K = 15$, $J = 2$, $\alpha = 0.05$, $\alpha_{1,1} = 0.01$, $\alpha_{0,1} = 1$, $r = 0.5$, $n_0 = 6$, $\nu_0 = n_0 - 1$, $D_{\mu,\Sigma} = 0.7$.



In the case of $\Sigma$ unknown, we consider comparisons for the
case of $D_1 = I$ which, using the results of Theorem 4.4, can
be performed in a similar way to the case of known $\Sigma$. For the
simulations in Figure 4, the case of $D_1 = I$ can be thought of as
representative of $\lambda_1^{-1}$ fairly distant to $c^*$ (right panel of Figure 2),
since we take $c^* = e_1$, resulting in $\cos(\mathrm{ang}(c^*, \lambda_1^{-1})) = \sqrt{K}/K$
($= 0.26$, angle 75°, for $K = 15$). As we would expect, the power
of all tests is lower than their counterparts for $\Sigma$ known (same
design parameters), but the patterns of power difference across
tests remain the same, except for Hotelling's $T^2$ which, in
contrast to the $\chi^2$-test, is highly dependent on the sample size.
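
The angle quoted above is easy to verify: for $c^* = e_1$ and $\lambda_1^{-1}$ proportional to the vector of ones, the cosine of the angle is $1/\sqrt{K} = \sqrt{K}/K$. The helper below is an illustrative sketch, and its name is not taken from the paper.

```python
import math


def cosine_and_angle(u, v):
    """Cosine of the angle between two vectors and the angle in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside the domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return cosine, math.degrees(math.acos(cosine))


if __name__ == "__main__":
    for K in (10, 15):
        e1 = [1.0] + [0.0] * (K - 1)   # c* = e_1
        ones = [1.0] * K               # direction of lambda_1^{-1} when D_1 = I
        print(K, cosine_and_angle(e1, ones))
```

For $K = 15$ this gives a cosine of about 0.26 and an angle of about 75°, and for $K = 10$ an angle of about 72°, in agreement with the values used for Figures 4 and 2.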

As Figure 4 illustrates, for $n_T < K$ or $n_T$ slightly larger than $K$
(here, $n_T = 10$-$30$ for $K = 15$), $T^2$ is respectively inapplicable
or very inefficient, with power levels lower than the power of $t^*$
even for angles close to orthogonal. As sample size becomes
considerably bigger than $K$ ($n_T > 50$), the power of the $T^2$-test
increases sharply to yield power levels analogous to the $\chi^2$-test.
For instance, for the design parameters in Figure 4, the single
stage and sequential $T^2$-tests, like the $\chi^2$-test, have power
close to the power of the $t^*$-test for angle 60° and 45°, respectively,
for large sample sizes.

6. APPLICATION TO AN EEG STUDY 

We consider applications to an electroencephalogram (EEG)
study, the results of which are provided in Lauter, Glimm, and
Kropf (1996). As Lauter et al. described, the data are collected
from $n_T = 19$ depressive patients at the beginning and at the
end of a six week therapy. For demonstration, $K = 9$ variables
are used which represent the changes of the absolute theta power
in channels 3-8, 17-19 of EEG during the therapy of each
patient. In Table 2, we present the means, standard deviations, and
correlation matrix of the data. Note that although an increase
is indicated in all channels, none of the channel-wise p-values
($\min_k p_k = 0.04$) falls below the Bonferroni corrected threshold
$\alpha/K = 0.0056$ at the $\alpha = 5\%$ significance level. Hotelling's
$T^2$-test also fails to reject $H_0$ ($p_{T^2} = 0.261$). On the contrary,
the SS and PC $t$-tests proposed by Lauter et al. reject $H_0$ at the
5% significance level ($p_{SS} = 0.0489$, $p_{PC} = 0.0487$).

We perform power analysis by setting the design parameters
as in the above study, that is, $n_T = 19$, $K = 9$, $\mu = \bar{y}$, $\Sigma = S_y$,
$\alpha = 0.05$. For these design parameters, the power of Hotelling's
$T^2$ is $\beta_{T^2} = 0.68$ ($D_{\mu,\Sigma} = 1.15$). This is larger than the power
of the SS and PC tests, which are respectively $\beta_{t_{SS}} = 0.52$ and
$\beta_{t_{PC}} = 0.51$ (the contrasting results of the tests performed using
these data are because of the different shape of the $t$ and
$F$ distributions). The latter power values are very close to the
power of the OLS $t$-test in O'Brien (1984), $\beta_{t_{OLS}} = 0.52$, which
uses the uniform weighting vector $w_{OLS} \propto \mathbf{1}$. This gives angle
$\mathrm{ang}(w_{OLS}, \omega^*) = 71°$. Taking into account that the single-stage
$t$-test for a weighting vector equal to the optimal has power
$\beta_t = 1$, we can easily see that there is considerable scope for
improvement.

Since the study was performed, there has been consider- 
able research into EEG studies on depressive patients. There 
is now literature (see, e.g., Davidson et al. 2002) indicating that 
left-frontal hypoactivation and right-frontal hyperactivation are 
present in such subjects. This would indicate that a nonuniform 
prior over these frontal regions should be used. Using prior 
information based on such evidence, the adaptive $t^*$-test can
attain high power levels. For example, the prior estimates given
in Table 2 are in agreement with the evidence in the literature
and further, the prior correlation structure is set to be roughly
coherent with the distances between the channels, that is, larger
distances have smaller correlations, with larger correlations set
at the highly active frontal regions (in accordance with the
literature).

Table 2. Means, standard deviations, correlations, and their prior estimates for the EEG depression study presented in Lauter, Glimm, and
Kropf (1996)

| ch. | 3 | 4 | 5 | 6 | 7 | 8 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|
| $\bar{y}_k$ | 0.8710 | 1.5890 | 1.0370 | 1.1460 | 0.8510 | 0.8530 | 1.4220 | 0.7510 | 0.9950 |
| $m_{0,k}$ | 0.5 | 3.50 | 1 | 2 | 2 | 2 | 2 | 2 | 2 |
| $s_{y,k}$ | 2.9494 | 3.5121 | 2.3637 | 2.2490 | 2.2760 | 2.0706 | 3.2624 | 2.6382 | 2.3593 |
| $s_{0,k}$ | 1.5 | 2.5 | 1 | 2 | 2 | 2 | 2 | 2 | 2 |
| $R_0 \backslash R_y$ | 3 | 4 | 5 | 6 | 7 | 8 | 17 | 18 | 19 |
| 3 | 1 | 0.9262 | 0.8115 | 0.7959 | 0.5786 | 0.4902 | 0.9323 | 0.4896 | 0.5312 |
| 4 | 0.8 | 1 | 0.6270 | 0.7835 | 0.3357 | 0.4450 | 0.9313 | 0.2778 | 0.4892 |
| 5 | 0.8 | 0.7 | 1 | 0.7882 | 0.8492 | 0.7173 | 0.7347 | 0.7145 | 0.7611 |
| 6 | 0.7 | 0.8 | 0.7 | 1 | 0.6020 | 0.7924 | 0.8180 | 0.6334 | 0.7783 |
| 7 | 0.5 | 0.4 | 0.7 | 0.55 | 1 | 0.6155 | 0.4639 | 0.6833 | 0.5992 |
| 8 | 0.4 | 0.5 | 0.55 | 0.7 | 0.6 | 1 | 0.5177 | 0.5983 | 0.7833 |
| 17 | 0.9 | 0.9 | 0.75 | 0.75 | 0.45 | 0.45 | 1 | 0.4048 | 0.5711 |
| 18 | 0.45 | 0.45 | 0.65 | 0.65 | 0.7 | 0.7 | 0.5 | 1 | 0.4445 |
| 19 | 0.75 | 0.75 | 0.8 | 0.8 | 0.65 | 0.65 | 0.8 | 0.7 | 1 |

This prior estimate gives $\mathrm{ang}(w_{t^*}, \omega^*) = 37.27°$, which is
much smaller than the angle under the uniform weighting
vector. For a two-stage design ($J = 2$), with balanced sample
allocation, $n_1 = 10$, $n_2 = 9$, and $\alpha$ allocation $\alpha_{1,1} = 0.01$,
$\alpha_2 = 0.0087$, no early acceptance allowed, $\alpha_{0,1} = 1$, prior sample
size $n_0 = 7 = 0.7 n_1$, $\nu_0 = 6$ (see previous section) and the
remaining design parameters as in the original study, the $t^*$-test
has power $\beta_{t^*} = 0.84$ with RSSR = 22.3% ($E(N) = 15$).
Substantial power improvement is also obtained over the $t^+$ which,
for $n_0 = 6$, $n_1 = 13$, $n_2 = 6$ ($r = 0.3$) and the remaining design
parameters as above, has power $\beta_{t^+} = 0.64$.



7. DISCUSSION 

The methods developed in this work demonstrate that lin- 
ear combination tests provide a substantial alternative to the 
classical Hotelling's T 2 global test, especially in the setting, 
commonly encountered in recent important applications of clin- 
ical neuroscience, of the available sample size n being small 
compared to the observation dimension K. It is also shown 
that adaptive linear combination tests provide power robustness 
across the set of alternative hypotheses since they can correct 
initial selections of the weighting vector which are far from the 
optimal selection. The adaptive $J$-stage $z^*$ and $t^*$-tests achieve
high power levels for large n, independently of the initial selec- 
tion of weighting vector, but most importantly they can achieve 
high-power performance even if n is limited. 

The proposed tests achieve optimality in the sense of max- 
imizing the predictive power of the test at each interim anal- 
ysis. Predictive power has been used for sample size calcula- 
tion (O'Hagan and Stevens 2001), treatment selection (Kimani, 
Stallard, and Hutton 2009) and to select the component-wise 
significance levels in multiple testing (Westfall, Krishen, and 
Young 1998). It is a useful tool for incorporating prior infor- 
mation into the design of a study, particularly as such studies 
can often be viewed as a decision-making process. The
application in Section 6 provides an example of a setting in which
prior information is available and can substantially improve the
performance of existing tests.

Optimality is attained in our methods without undermining 
the two main targets of adaptive designs: flexibility and test 
specificity. This allows for future developments of the proposed
tests to consider further optimal design adaptations. The use
of other adaptive design techniques, such as sample size
reassessment, within our methodology can further improve the
performance of the proposed tests.

The power characterization in Section 4 provides a tool for 
understanding and alleviating to some extent the complexities 
of multivariate tests especially those based on response dimen- 
sion reductions. The possibly high-dimensional model param- 
eters and their prior estimates are reduced to low-dimensional 
summaries which are still sufficient to compute power. Impor- 
tantly, these summaries have interpretations directly related to 
the strength of the treatment effect and the effect of the dimen- 
sion reduction on power. They provide a method for performing 
simple power analysis, but also understanding the behavior of 
linear combination tests. 

The methods used to derive the power characterization are 
also interesting in their own right. They can be generally de- 
scribed by two steps: standardization and rotation invariance. 
The first standardization step is a prevalent technique for reex- 
pressing statistical models in the standard deviation unit and 
eliminating correlations. Here, it allows us to reexpress the 
weighting vector selection, which involves estimating the un- 
known model parameters, as a procedure of learning a single 
vector, that is, the optimal weighting vector. The second step of 
establishing a rotation invariance property for the power func- 
tion allows us to identify the measure quantifying the angular 
distance between the selected and the optimal weighting vector, 
reducing further the design space. The question whether these 
results can be derived under more relaxed modeling assumptions 
is an area of ongoing research. 

SUPPLEMENTARY MATERIALS 

Additional supplementary material is provided in the follow- 
ing documents: 
Supplement A: Technical results. Technical details, lemmas,
and proofs.

Supplement B: Extended simulation examples Examples 
from the extensive simulation studies performed to study the 
power of the considered tests. 